[AMD][MI35X] 0612 DSV4 #1715
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR first before we can merge your single-node PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you! PR authors are responsible for ensuring that, after merging, all GitHub Action jobs fully pass. Failures are often just flakes, and simply re-running the failed jobs will fix them. If a re-run is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, PR authors should request a review and get a PR approval from their respective company's CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27406714195 |
|
@functionstackx could you please review this? |
|
yes. lgtm once passes and then i will do /reuse on it https://github.com/SemiAnalysisAI/InferenceX/actions/runs/27475314602/job/81213395277?pr=1715 |
|
hi @1am9trash it seems like conc512 is failing, can u take a look? (will cancel the rest of the conc for now to avoid clogging up the CI queue since conc512 failed already)
|
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27475314602 |
|
Hi, @functionstackx
My assumption is that other workloads on the server were competing for resources at the time, so the failure is unrelated to the v4 testing changes introduced in this PR. I reran the task (conc=512), and it completed successfully without hitting the issue. Thanks. |


Successful run:
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27406714195
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27406714195
Change:
Note
Low Risk
Benchmark and serving-flag tweaks for one AMD DSv4 recipe; no auth, data, or core app logic changes.
Overview
Updates the dsv4-fp4-mi355x-sglang CI/benchmark pin to lmsysorg/sglang-rocm:v0.5.13-rocm720-mi35x-20260612 (from v0.5.12.post1-…-20260610), pairing the new image with the MoE intermediate_pad fix in upstream sglang PR #27858.

In dsv4_fp4_mi355x_sglang.sh, chunked prefill is no longer hard-coded to 8192: it defaults to 8192 and scales to 8192 * TP when DP_ATTENTION is enabled, so TP8/DP8 runs use the intended prefill chunk size. perf-changelog.yaml records the image bump and both fixes.

Reviewed by Cursor Bugbot for commit 8d20601. Bugbot is set up for automated code reviews on this repo.
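
For readers who want to see the chunked-prefill rule from the overview spelled out, here is a minimal sketch. It is not the actual contents of dsv4_fp4_mi355x_sglang.sh: the function name `chunked_prefill_size`, the convention that `DP_ATTENTION` is the string `true` when enabled, and the fallback values are all assumptions made for illustration. Only the 8192 default and the `8192 * TP` scaling come from the PR summary.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compute the chunked prefill size described in the PR summary.
# Arguments: $1 = tensor-parallel size (TP), $2 = "true" if DP attention is enabled.
chunked_prefill_size() {
  local tp="$1"
  local dp_attention="$2"
  local base=8192  # default chunk size stated in the PR summary

  if [ "$dp_attention" = "true" ]; then
    # With DP attention enabled, the chunk size scales with TP.
    echo $(( base * tp ))
  else
    echo "$base"
  fi
}

# Example call; the fallbacks (TP=8, DP_ATTENTION=false) are assumptions for this sketch.
CHUNK_SIZE="$(chunked_prefill_size "${TP:-8}" "${DP_ATTENTION:-false}")"
echo "chunked prefill size: ${CHUNK_SIZE}"
```

Under these assumptions, a TP8 run with DP attention enabled would get 8192 * 8 = 65536, while a run without DP attention keeps the 8192 default.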